<html>
<head><meta charset="utf-8"><title>Extracting the lexer · t-compiler/rust-analyzer · Zulip Chat Archive</title></head>
<h2>Stream: <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/index.html">t-compiler/rust-analyzer</a></h2>
<h3>Topic: <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html">Extracting the lexer</a></h3>

<hr>

<base href="https://rust-lang.zulipchat.com">

<head><link href="https://rust-lang.github.io/zulip_archive/style.css" rel="stylesheet"></head>

<a name="162264211"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162264211" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> matklad <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162264211">(Apr 01 2019 at 18:16)</a>:</h4>
<p>I’ve just realized that a relatively easy part of the compiler to extract would be the lexer. I imagine it could have a relatively pure interface which works for both rustc and rls2 (the current interface in rustc is very impure and, for example, depends on string interning). Note that in and of itself, extracting lexer would’t be benefitial: it’s a stable and easy part. But his would be a good precedent</p>



<a name="162264265"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162264265" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> matklad <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162264265">(Apr 01 2019 at 18:17)</a>:</h4>
<p>Should we spend time on this?</p>



<a name="162419497"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162419497" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> detrumi <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162419497">(Apr 03 2019 at 11:21)</a>:</h4>
<p>It might be a good starting point to get things to line up</p>



<a name="162419582"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162419582" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> detrumi <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162419582">(Apr 03 2019 at 11:23)</a>:</h4>
<p>For details like string interning, should those line up as well, or can it be abstracted over?</p>



<a name="162419663"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162419663" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> detrumi <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162419663">(Apr 03 2019 at 11:24)</a>:</h4>
<p>It feels like rustc does many things differently for performance, which makes it hard to extract. But maybe that's just a problem with the current implementation?</p>



<a name="162419850"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162419850" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> matklad <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162419850">(Apr 03 2019 at 11:27)</a>:</h4>
<p>I hope there's a purely-functional interface to the lexer, which should achieve both  good perf and flexibility, required for IDE</p>



<a name="162419857"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162419857" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> matklad <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162419857">(Apr 03 2019 at 11:28)</a>:</h4>
<p>In particular, I think the lexer could return string slices, and leave the interning to the calling code</p>



<a name="162420074"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162420074" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> detrumi <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162420074">(Apr 03 2019 at 11:30)</a>:</h4>
<p>makes sense</p>



<a name="162515922"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162515922" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> matklad <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162515922">(Apr 04 2019 at 11:38)</a>:</h4>
<p>So, I've looked closer at the lexer and I think we should do it. My plan is to produce an <code>rust_lexer</code> crate, which has zero/few dependencies (we might pull something in for XID_Start), with roughly the following interface:</p>
<div class="codehilite"><pre><span></span>pub enum LexerError {
    BareCRinComment { position: usize },
    ....
}

pub struct Token {
     kind: TokenKind,
     len: usize,
}

#[repr(u8)] // fieldless
pub enum TokenKind {
    Shebang,
    Comment,
    Whitespace,
    ....
    Error,
}

pub fn next_token(src: &amp;str, is_file_start: bool, errors: &amp;mut Vec&lt;LexerError&gt;) -&gt; Token {
    assert!(!src.is_empty());
    ....
}
</pre></div>


<p>RLS2 will consume the tokens and build up a concrete syntax tree.</p>
<p>rustc will consume the tokens and build the rich rustc tokens, with spans, interned identifiers, escaped strings, etc. </p>
<p>the crate will live in rust-lang/rust repo. Long term, I think we should add an <code>/libs</code> top-level dir, alongside with <code>/src</code>, where we store librarified compiler which you can build without <code>x.py</code>, LLVM and bootstraping. </p>
<p><span class="user-group-mention" data-user-group-id="1060">@WG-rls2.0</span> thoughts?</p>



<a name="162516045"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162516045" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> matklad <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162516045">(Apr 04 2019 at 11:40)</a>:</h4>
<p>Note that immediate benefit from extracting the lexer would be negligible: it's a simple bit of code which rests with few modifications. However, I find it important to actually start the "from the ground up" librarification, and this seems like a good case .</p>



<a name="162519270"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162519270" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> detrumi <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162519270">(Apr 04 2019 at 12:27)</a>:</h4>
<p>In the short term, having it live inside rust-lang/rust would probably mean that it requires <code>x.py</code> and bootstrapping, which is a bit too much overhead for such a small library. And for the long term, that'd mean that <code>x.py</code> would also have to call cargo to build the <code>/libs</code> crates, wouldn't that complicate the build process further?</p>



<a name="162519400"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162519400" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> detrumi <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162519400">(Apr 04 2019 at 12:28)</a>:</h4>
<p>But I guess moving the lexer to a separate repo (and to <a href="http://crates.io" target="_blank" title="http://crates.io">crates.io</a>?) would make changes harder, since it would be spread across multiple repositories</p>



<a name="162519878"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162519878" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> matklad <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162519878">(Apr 04 2019 at 12:35)</a>:</h4>
<p>Yeah, this plan is to publish it to crates,io, so that rust-analyzer can just pick it up. The /libs setup will complicate the build, but not too much x.py already uses cargo to build everything</p>



<a name="162558147"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/185405-t-compiler/rust-analyzer/topic/Extracting%20the%20lexer/near/162558147" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> matklad <a href="https://rust-lang.github.io/zulip_archive/stream/185405-t-compiler/rust-analyzer/topic/Extracting.20the.20lexer.html#162558147">(Apr 04 2019 at 19:09)</a>:</h4>
<p>proposed API: <a href="https://github.com/rust-lang/rust/pull/59706" target="_blank" title="https://github.com/rust-lang/rust/pull/59706">https://github.com/rust-lang/rust/pull/59706</a></p>



<hr><p>Last updated: Aug 07 2021 at 22:04 UTC</p>
</html>